An Ensemble Method for Spelling Correction in Consumer Health Questions
Abstract
Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29 achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is the most relevant feature for spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features.
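The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights, the toy frequency table, and the co-occurrence sets are all hypothetical, and the context score here is a simple overlap measure standing in for whatever contextual similarity the paper actually uses. The sketch covers only single-token misspellings, not word breaks or punctuation errors.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def orthographic_score(token: str, candidate: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    return 1.0 - edit_distance(token, candidate) / max(len(token), len(candidate))


def frequency_score(candidate: str, freqs: Counter) -> float:
    """Corpus frequency relative to the most frequent vocabulary word."""
    return freqs[candidate] / max(freqs.values())


def context_score(candidate: str, context: list, cooccur: dict) -> float:
    """Fraction of context words previously seen with the candidate
    (a toy stand-in for a real contextual similarity measure)."""
    if not context:
        return 0.0
    seen = cooccur.get(candidate, set())
    return sum(1 for w in context if w in seen) / len(context)


def correct(token, context, freqs, cooccur, weights=(0.6, 0.2, 0.2), max_dist=2):
    """Return the best-scoring in-vocabulary candidate for `token`.

    The weights are hypothetical; the orthographic weight is set highest
    only to echo the abstract's finding that orthographic similarity
    matters most.
    """
    if token in freqs:  # known word: leave unchanged
        return token
    candidates = [w for w in freqs if edit_distance(token, w) <= max_dist]
    if not candidates:
        return token
    w_orth, w_freq, w_ctx = weights

    def score(c):
        return (w_orth * orthographic_score(token, c)
                + w_freq * frequency_score(c, freqs)
                + w_ctx * context_score(c, context, cooccur))

    return max(candidates, key=score)


# Toy data, invented for illustration only.
freqs = Counter({"diabetes": 50, "diabetic": 20, "diet": 30, "insulin": 25})
cooccur = {
    "diabetes": {"insulin", "type", "sugar"},
    "diabetic": {"diet", "patient"},
}

print(correct("diabetis", ["type", "insulin"], freqs, cooccur))
```

In this toy case "diabetes" and "diabetic" are both one edit away from "diabetis", so the orthographic score alone cannot separate them; the frequency and context scores break the tie, which is the sense in which those signals are complementary.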
Related resources
Context-Sensitive Spelling Correction of Consumer-Generated Content on Health Care
BACKGROUND: Consumer-generated content, such as postings on social media websites, can serve as an ideal source of information for studying health care from a consumer's perspective. However, consumer-generated content on health care topics often contains spelling errors, which, if not corrected, will be obstacles for downstream computer-based text analysis. OBJECTIVE: In this study, we propose...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Phonetic based sentence level rewriting of questions typed by dyslexic spellers
This paper introduces a method combining written correction and phonetic interpretation in order to automatically rewrite sentences typed by dyslexic spellers. The method uses a finite-state automata framework. Dysorthographics refers to incorrect word segmentation, which usually causes classical spelling correctors to fail. Our approach differs from spelling correction in that we aim to use severa...
The Impact of Correction for Guessing Formula on MC and Yes/No Vocabulary Tests' Scores
A standard correction for random guessing (cfg) formula on multiple-choice and Yes/No examinations was examined retrospectively in the scores of the intermediate female EFL learners in an English language school. The correction was a weighting formula for points awarded for correct answers, incorrect answers, and unanswered questions so that the expected value of the increase in test score due to g...
Pronunciation Modeling for Improved Spelling Correction
This paper presents a method for incorporating word pronunciation information in a noisy channel model for spelling correction. The proposed method builds an explicit error model for word pronunciations. By modeling pronunciation similarities between words we achieve a substantial performance improvement over the previous best performing models for spelling correction.
Journal title:
- AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume: 2015
Publication date: 2015